[TheiaEuk] TheiaEuk ONT Workflow #644

Draft: wants to merge 13 commits into main

Michal-Babins (Contributor) commented Oct 8, 2024

This PR closes #625

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

Creates a TheiaEuk workflow for Oxford Nanopore Technologies (ONT) reads.

⚡ Impacted Workflows/Tasks

  • New workflow: wf_theiaeuk_ont
  • Modified: wf_merlin_magic (to accommodate eukaryotic-specific analyses)
  • Modified: task_snippy_variants and task_snippy_gene_query (to work with ONT data)
  • Modified: wf_read_QC_trim_ont (to add a TheiaEuk-specific path)

This PR may lead to different results in pre-existing outputs: No

This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

⚙️ Algorithm

  • Added a TheiaEuk-specific QC and trimming path to wf_read_QC_trim_ont
  • Adapted Snippy for variant calling with ONT data
  • Added Candida auris ONT-specific analyses to the Merlin Magic workflow

➡️ Inputs

The following inputs are mandatory for theiaeuk_ont:

File read1
String samplename

The following inputs are optional for theiaeuk_ont:

Int genome_length
String? assembler
String? assembler_options
Int dragonflye_cpu
Int dragonflye_memory
Int dragonflye_disk_size
String medaka_model
File gambit_db_genomes
File gambit_db_signatures
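The inputs listed above can be assembled into an inputs JSON for a WDL runner. This is a minimal Python sketch, assuming the workflow is declared as `workflow theiaeuk_ont` and that the runner (for example Cromwell or miniwdl) uses the usual `<workflow name>.<input name>` key convention; `build_inputs` and the example path are hypothetical illustrations, not part of the PR.

```python
import json

# Assumption: the WDL workflow name is `theiaeuk_ont`; inputs JSON keys are
# namespaced as "<workflow name>.<input name>".
WORKFLOW_NAME = "theiaeuk_ont"
REQUIRED_INPUTS = ("read1", "samplename")
OPTIONAL_INPUTS = (
    "genome_length", "assembler", "assembler_options",
    "dragonflye_cpu", "dragonflye_memory", "dragonflye_disk_size",
    "medaka_model", "gambit_db_genomes", "gambit_db_signatures",
)


def build_inputs(values: dict) -> dict:
    """Return namespaced inputs, rejecting missing required or unknown inputs."""
    missing = [name for name in REQUIRED_INPUTS if name not in values]
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    known = REQUIRED_INPUTS + OPTIONAL_INPUTS
    unknown = [name for name in values if name not in known]
    if unknown:
        raise ValueError(f"unknown inputs: {unknown}")
    return {f"{WORKFLOW_NAME}.{name}": value for name, value in values.items()}


if __name__ == "__main__":
    # Hypothetical read file and sample name, for illustration only.
    inputs = build_inputs({
        "read1": "gs://example-bucket/sample01.fastq.gz",
        "samplename": "sample01",
    })
    print(json.dumps(inputs, indent=2))
```

Validating names before writing the JSON catches typos early, since a misspelled optional input would otherwise just be ignored or rejected by the runner at submission time.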

⬅️ Outputs

    # Version Capture
    String theiaeuk_ont_version = version_capture.phb_version
    String theiaeuk_ont_analysis_date = version_capture.date
    # Read QC outputs
    File read1_clean = read_qc.read1_clean
    String? nanoq_version = read_qc.nanoq_version
    Int est_genome_length = read_qc.est_genome_length
    # Assembly outputs
    File assembly_fasta = dragonflye.assembly_fasta
    File contigs_gfa = dragonflye.contigs_gfa
    String dragonflye_version = dragonflye.dragonflye_version
    # Read QC - nanoplot raw outputs
    File? nanoplot_html_raw = nanoplot_raw.nanoplot_html
    File? nanoplot_tsv_raw = nanoplot_raw.nanoplot_tsv
    Int? nanoplot_num_reads_raw1 = nanoplot_raw.num_reads
    Float? nanoplot_r1_median_readlength_raw = nanoplot_raw.median_readlength
    Float? nanoplot_r1_mean_readlength_raw = nanoplot_raw.mean_readlength
    Float? nanoplot_r1_stdev_readlength_raw = nanoplot_raw.stdev_readlength
    Float? nanoplot_r1_n50_raw = nanoplot_raw.n50
    Float? nanoplot_r1_mean_q_raw = nanoplot_raw.mean_q
    Float? nanoplot_r1_median_q_raw = nanoplot_raw.median_q
    Float? nanoplot_r1_est_coverage_raw = nanoplot_raw.est_coverage
    # Read QC - nanoplot clean outputs
    File? nanoplot_html_clean = nanoplot_clean.nanoplot_html
    File? nanoplot_tsv_clean = nanoplot_clean.nanoplot_tsv
    Int? nanoplot_num_reads_clean1 = nanoplot_clean.num_reads
    Float? nanoplot_r1_median_readlength_clean = nanoplot_clean.median_readlength
    Float? nanoplot_r1_mean_readlength_clean = nanoplot_clean.mean_readlength
    Float? nanoplot_r1_stdev_readlength_clean = nanoplot_clean.stdev_readlength
    Float? nanoplot_r1_n50_clean = nanoplot_clean.n50
    Float? nanoplot_r1_mean_q_clean = nanoplot_clean.mean_q
    Float? nanoplot_r1_median_q_clean = nanoplot_clean.median_q
    Float? nanoplot_r1_est_coverage_clean = nanoplot_clean.est_coverage
    # Read QC - nanoplot general outputs
    String? nanoplot_version = nanoplot_raw.nanoplot_version
    String? nanoplot_docker = nanoplot_raw.nanoplot_docker
    # Assembly QC - quast outputs
    File? quast_report = quast.quast_report
    String? quast_version = quast.version
    Int? assembly_length = quast.genome_length
    Int? number_contigs = quast.number_contigs
    Int? n50_value = quast.n50_value
    Float? quast_gc_percent = quast.gc_percent
    # Assembly QC - nanoplot outputs
    Float? est_coverage_raw = nanoplot_raw.est_coverage
    Float? est_coverage_clean = nanoplot_clean.est_coverage
    # Assembly QC - busco outputs
    String? busco_version = busco.busco_version
    String? busco_docker = busco.busco_docker
    String? busco_database = busco.busco_database
    String? busco_results = busco.busco_results
    File? busco_report = busco.busco_report
    # Gambit outputs
    File gambit_report_file = gambit.gambit_report_file
    File gambit_closest_genomes_file = gambit.gambit_closest_genomes_file
    String gambit_predicted_taxon = gambit.gambit_predicted_taxon
    String gambit_predicted_taxon_rank = gambit.gambit_predicted_taxon_rank
    String gambit_next_taxon = gambit.gambit_next_taxon
    String gambit_next_taxon_rank = gambit.gambit_next_taxon_rank
    String gambit_version = gambit.gambit_version
    String gambit_db_version = gambit.gambit_db_version
    String merlin_tag = gambit.merlin_tag
    String gambit_docker = gambit.gambit_docker
    # C. auris specific outputs for cladetyper
    String? clade_type = merlin_magic.clade_type
    String? cladetyper_analysis_date = merlin_magic.cladetyper_analysis_date
    String? cladetyper_version = merlin_magic.cladetyper_version
    String? cladetyper_docker_image = merlin_magic.cladetyper_docker_image
    String? cladetype_annotated_ref = merlin_magic.cladetype_annotated_ref
    # Snippy variants outputs
    String? snippy_variants_version = merlin_magic.snippy_variants_version
    String? snippy_variants_query = merlin_magic.snippy_variants_query
    String? snippy_variants_query_check = merlin_magic.snippy_variants_query_check
    String? snippy_variants_hits = merlin_magic.snippy_variants_hits
    String? snippy_variants_gene_query_results = merlin_magic.snippy_variants_gene_query_results
    String? snippy_variants_outdir_tarball = merlin_magic.snippy_variants_outdir_tarball
    String? snippy_variants_results = merlin_magic.snippy_variants_results
    String? snippy_variants_bam = merlin_magic.snippy_variants_bam
    String? snippy_variants_bai = merlin_magic.snippy_variants_bai
    String? snippy_variants_summary = merlin_magic.snippy_variants_summary
    String? snippy_variants_num_reads_aligned = merlin_magic.snippy_variants_num_reads_aligned
    String? snippy_variants_coverage_tsv = merlin_magic.snippy_variants_coverage_tsv
    String? snippy_variants_num_variants = merlin_magic.snippy_variants_num_variants
    String? snippy_variants_percent_ref_coverage = merlin_magic.snippy_variants_percent_ref_coverage

🧪 Testing

Suggested Scenarios for Reviewer to Test

🔬 Final Developer Checklist

  • The workflow/task has been tested and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (Theiagen developers)
  • Code changes follow the style guide
  • Documentation and/or workflow diagrams have been updated if applicable (Theiagen developers only)

🎯 Reviewer Checklist

  • All changed results have been confirmed
  • You have tested the PR appropriately (see the testing guide for more information)
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments
  • The documentation has been updated

michellescribner (Contributor) commented Oct 30, 2024

I've launched some initial functional tests using ONT data from SRA for fungal pathogens and intend to explore how closely the assembly lengths match expected values.

Update: All of the tested samples have either low coverage (<40X) or low mean base quality scores (<12). Even so, the de novo assemblies were often close to the expected genome length for the species, but BUSCO reported many missing loci (BUSCO completeness <60%). A high error rate is expected for these assemblies.
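The coverage and quality thresholds from this test round can be expressed as a simple pre-screen. A minimal Python sketch, assuming estimated coverage is total read bases divided by the expected genome length, and that mean quality is averaged in error-probability space (as NanoPlot-style tools commonly report it); whether the workflow's nanoplot/nanoq tasks compute it exactly this way is an assumption, and the function names are hypothetical.

```python
import math
from typing import Iterable, List

# Thresholds from the test notes: below 40X coverage or below Q12 mean quality.
MIN_COVERAGE = 40.0
MIN_MEAN_Q = 12.0


def estimated_coverage(read_lengths: Iterable[int], genome_length: int) -> float:
    """Estimated depth: total sequenced bases divided by expected genome length."""
    return sum(read_lengths) / genome_length


def mean_phred(qualities: Iterable[int]) -> float:
    """Average Phred scores as error probabilities, then convert back to Phred."""
    probs = [10 ** (-q / 10) for q in qualities]
    return -10 * math.log10(sum(probs) / len(probs))


def qc_flags(read_lengths: List[int], qualities: List[int], genome_length: int) -> List[str]:
    """Return the names of any thresholds the sample fails."""
    flags = []
    if estimated_coverage(read_lengths, genome_length) < MIN_COVERAGE:
        flags.append("low_coverage")
    if mean_phred(qualities) < MIN_MEAN_Q:
        flags.append("low_mean_q")
    return flags
```

Averaging in probability space matters: `mean_phred([10, 20])` is about 12.6, not the arithmetic mean of 15, so a few low-quality reads pull the reported mean down sharply.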

Higher-quality or higher-depth read data may substantially improve the performance of TheiaEuk_ONT.
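The BUSCO completeness criterion from the test notes (below 60%) can be checked directly from the `busco_results` output. A minimal Python sketch, assuming `busco_results` holds BUSCO's one-line summary in the form `C:...%[S:...%,D:...%],F:...%,M:...%,n:...`, where complete (C) is single-copy (S) plus duplicated (D); the helper names are hypothetical.

```python
import re

# Threshold from the test notes: BUSCO completeness below 60% was considered poor.
MIN_COMPLETENESS = 60.0

# Matches "key:value" pairs such as "C:55.0" or "n:758" in the summary string.
PAIR_RE = re.compile(r"([A-Za-z]):([\d.]+)")


def parse_busco_summary(summary: str) -> dict:
    """Parse a BUSCO one-line summary into {letter: value}."""
    scores = {key: float(value) for key, value in PAIR_RE.findall(summary)}
    if "C" not in scores:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    return scores


def is_incomplete(summary: str) -> bool:
    """True when the complete (C) percentage is below the threshold."""
    return parse_busco_summary(summary)["C"] < MIN_COMPLETENESS
```

Parsing letter/value pairs instead of matching the whole string keeps the check tolerant of extra fields that some BUSCO versions may add to the summary.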


Successfully merging this pull request may close this issue: [New workflow] TheiaEuk_ONT